61 research outputs found

Robust distance correlation for variable screening

Author: Ke Hongjie
Ma Tianzhou
Ren Zhao
Publication date: 26/12/2022

High-dimensional data are common in modern statistical applications, and variable selection methods play an indispensable role in identifying the critical features for scientific discovery. Traditional best subset selection methods are computationally intractable with a large number of features, while regularization methods such as Lasso, SCAD and their variants perform poorly on ultrahigh-dimensional data because of low computational efficiency and algorithmic instability. Sure screening methods have become popular alternatives: they first rapidly reduce the dimension using simple measures such as marginal correlation, then apply a regularization method. A number of screening methods for different models or problems have been developed; however, none of them targets heavy-tailed data, another important characteristic of modern big data. In this paper, we propose a robust distance correlation ("RDC") based sure screening method for ultrahigh-dimensional regression with heavy-tailed data. The proposed method shares the good properties of the original model-free distance correlation based screening, with the additional merit of robustly estimating the distance correlation when the data are heavy-tailed, which improves model selection performance in screening. We conducted extensive simulations under different scenarios of heavy-tailedness to demonstrate the advantage of the proposed procedure over existing model-based and model-free screening procedures in feature selection and prediction performance. We also applied the method to high-dimensional heavy-tailed RNA sequencing (RNA-seq) data from The Cancer Genome Atlas (TCGA) pancreatic cancer cohort, where RDC outperformed the other methods in prioritizing the most essential and biologically meaningful genes.
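The screening idea summarized in this abstract (score each feature by a marginal dependence measure, then keep the top-ranked ones) can be sketched as follows. This is only a minimal illustration using the classical sample distance correlation; it is not the robust RDC estimator the paper proposes, and the function names `distance_correlation` and `screen_features` and the `keep` parameter are hypothetical.

```python
import numpy as np


def distance_correlation(x, y):
    """Classical (non-robust) sample distance correlation of two 1-D samples."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    # Pairwise absolute distances within each sample.
    a = np.abs(x - x.T)
    b = np.abs(y - y.T)
    # Double centering: subtract row and column means, add back the grand mean.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = (A * B).mean()   # squared sample distance covariance
    dvar_x = (A * A).mean()  # squared sample distance variance of x
    dvar_y = (B * B).mean()  # squared sample distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def screen_features(X, y, keep):
    """Rank the columns of X by distance correlation with y; return the top `keep`."""
    scores = np.array([distance_correlation(X[:, j], y) for j in range(X.shape[1])])
    order = np.argsort(-scores)  # indices sorted by descending score
    return order[:keep], scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((300, 20))
    # Hypothetical toy response: depends nonlinearly on feature 3 only.
    y = X[:, 3] ** 2 + 0.1 * rng.standard_normal(300)
    top, scores = screen_features(X, y, keep=5)
    print("retained feature indices:", top)
```

Because distance correlation is model-free, the nonlinear dependence on a single feature in the toy example is detectable even though its Pearson correlation with the response is close to zero. A robust variant would replace the plain averages above with estimators that are less sensitive to heavy tails.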

arXiv.org e-Print Archive

Bayesian indicator variable selection of multivariate response with heterogeneous sparsity for multi-trait fine mapping

Author: Canida Travis
Ke Hongjie
Ma Tianzhou
Publication date: 26/12/2022

Variable selection has played a critical role in contemporary statistics and scientific discovery. Numerous regularization and Bayesian variable selection methods have been developed in the past two decades, but they mainly target a single response. As more data are collected, it is common to obtain and analyze multiple correlated responses from the same study. Running a separate regression for each response ignores their correlation, so multivariate analysis is recommended. Existing multivariate methods select variables related to all responses without considering the possibly heterogeneous sparsity of different responses, i.e. some features may predict only a subset of the responses. In this paper, we develop a novel Bayesian indicator variable selection method for multivariate regression models with a large number of grouped predictors, targeting multiple correlated responses with possibly heterogeneous sparsity patterns. The method is motivated by the multi-trait fine mapping problem in genetics, which aims to identify the variants that are causal to multiple related traits. Our new method performs selection at the individual level, at the group level, and specific to each response. In addition, we propose a new concept of subset posterior inclusion probability for inference, to prioritize predictors that target subset(s) of responses. Extensive simulations with varying sparsity and heterogeneity levels and dimensions show the advantage of our method in variable selection and prediction performance over existing general Bayesian multivariate variable selection methods and Bayesian fine mapping methods. We also applied our method to a real data example in imaging genetics and identified important causal variants for brain white matter structural change in different regions. Comment: 29 pages, 3 figures

arXiv.org e-Print Archive

MEMD-ABSA: A Multi-Element Multi-Domain Dataset for Aspect-Based Sentiment Analysis

Author: Cai Hongjie
Li Ke
Liu Shijie
Song Nan
Wang Zengzhi
Wu Siwei
Xia Rui
Xie Qiming
Yu Jianfei
Zhao Qiankun
Publication date: 29/06/2023

Aspect-based sentiment analysis (ABSA) is a long-standing research interest in the field of opinion mining, and in recent years researchers have gradually shifted their focus from simple ABSA subtasks to end-to-end multi-element ABSA tasks. However, the datasets currently used in this research are limited to individual elements of specific tasks, usually focus on in-domain settings, ignore implicit aspects and opinions, and are small in scale. To address these issues, we propose a large-scale Multi-Element Multi-Domain dataset (MEMD) that covers the four elements across five domains, including nearly 20,000 review sentences and 30,000 quadruples annotated with explicit and implicit aspects and opinions for ABSA research. We also evaluate generative and non-generative baselines on multiple ABSA subtasks under the open-domain setting; the results show that open-domain ABSA and the mining of implicit aspects and opinions remain open challenges. The datasets are publicly released at https://github.com/NUSTM/MEMD-ABSA

arXiv.org e-Print Archive

Evaluation of Changes in the Characteristic Flavor of Ultra-high Temperature Sterilized Milk under the Effects of Temperature and Light

Author: LI Zepeng, DENG Yuming, ZENG Ke, XI Hongjie, LU Lixin, SONG Lijun
Publication venue: China Food Publishing Company
Publication date: 01/08/2023

To study changes in the characteristic flavor of ultra-high temperature sterilized (UHT) milk under the influence of storage temperature and light, headspace solid phase microextraction (SPME) combined with gas chromatography-mass spectrometry (GC-MS) was used to detect the volatile flavor components of the product. Descriptive sensory evaluation, orthogonal partial least squares-discriminant analysis (OPLS-DA) and the entropy weight method were used to determine the relationship between the major characteristic flavors and the characteristic substances. The effects of temperature and light flux on the flavor changes of different formulations of UHT milk were analyzed, and a model for comprehensive analysis of the characteristic flavors of UHT milk was developed based on the effects of initial unsaturated fatty acid content, temperature and light flux. The results of this research provide support for the quality control of different formulations of UHT milk.

Directory of Open Access Journals

Psychometric assessment of HIV/STI sexual risk scale among MSM: A Rasch model approach

Author: A Tennant
C Beyrer
C Fox
CL Mattson
CW Kahler
D Andrich
DD Heckathorn
EV Smith Jr
F Franchignoni
G Rasch
GN Brahmam
H Liu
Hongjie Liu
Hui Liu
I Marais
JA Bauermeister
Jian Li
JM Linacre
KE Schroder
M Fendrich
Q He
S Fergus
SH Kook
TG Bond
Tiejian Feng
Y Guo
Yumao Cai
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Little research has assessed the degree of severity and ordering of different types of sexual behaviors for HIV/STI infection in a measurement scale. The purpose of this study was to apply the Rasch model to the psychometric assessment of an HIV/STI sexual risk scale among men who have sex with men (MSM). Methods: A cross-sectional study using respondent-driven sampling was conducted among 351 MSM in Shenzhen, China. The Rasch model was used to examine the psychometric properties of an HIV/STI sexual risk scale comprising nine types of sexual behaviors. Results: The Rasch analysis of the nine items met the unidimensionality and local independence assumptions. Although the person reliability was low at 0.35, the item reliability was high at 0.99. The fit statistics provided acceptable infit and outfit values. Item difficulty invariance analysis showed that the item estimates of the risk behavior items were invariant (within error). Conclusions: The findings suggest that the Rasch model can be used to measure the level of sexual risk for HIV/STI infection as a single latent construct and to establish the relative degree of severity of each type of sexual behavior in HIV/STI transmission and acquisition among MSM. The measurement scale provides a useful tool to inform, design and evaluate behavioral interventions for HIV/STI infection among MSM.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

VCU Scholars Compass

Epidemic characteristics, high-risk townships and space-time clusters of human brucellosis in Shanxi Province of China, 2005–2014

Author: A Mollalo
AS Dean
B Cui
C Qiulan
CHENG Xiao-ping
China Center for Disease Control and Prevention
D Wang
Di Mu
G Pappas
GF Araj
Hang Zhou
Hongjie Yu
J Kunda
J Zhang
JD Chen
JH Zhang
K Skalsky
L Fang
L Yang
LH Duczmal
LU Rui-li
M Kulldorff
MA Jia-qi
MP Franco
P Jia
Q Hou
Qiulan Chen
R Abdullayev
RA Greenfield
RM Traxler
S Al Dahouk
S Chen
S Dahouk Al
S p Fan
Shengjie Lai
SM Alavi
T Man
Weizhong Yang
Wenwu Yin
WY Zhang
Y Ke
Y Wang
Y Zhao
Y-f Bai
YJ Li
Yu Li
Z Chen
Z Zhong
ZHANG Yanhong
ZHENG Yang
Zhongjie Li
Publication venue: Springer Science and Business Media LLC

Crossref

Changing Geographic Patterns and Risk Factors for Avian Influenza A(H7N9) Infections in Humans, China

Author: Artois
Bjørnstad
Cowling
Crase
Elith
Fang
Filip Claes
Friedman
Fuller
Fusheng Guo
Gilbert
Hongjie Yu
Hui Jiang
Jean Artois
Jiandong Zheng
Juanjuan Zhang
Kalthoff
Ke
Lam
Lehner
Li
Liu
Luzhao Feng
Ma
Madhur S. Dhingra
Marius Gilbert
Morgan Pearcy
Nicolas
Pantin-Jackwood
Qin
Robinson
Shengjie Lai
Simon I. Hay
Sophie von Dobschuetz
Timothy Robinson
Uyeki
Vincent Martin
Virlogeux
Wang
Wantanee Kalpravidh
Wu
Xiang
Xiangming Xiao
Xiling Wang
Xu
Yang
Yangni He
Ying Qin
Yu
Yuan
Yujing Shi
Zhibin Peng
Zhou
Zhu
Publication venue: Centers for Disease Control and Prevention (CDC)

Crossref

Assessing trend and variation of Arctic sea-ice extent during 1979-2012 from a latitude perspective of ice edge

Author: Ke Changqing
Xia Wentao
Xie Hongjie
Publication venue: Co-Action Publishing
Publication date: 01/09/2014

Arctic sea-ice extent (in summer) has been shrinking since the 1970s. However, we have little knowledge of the detailed spatial variability of this shrinking. In this study, we examine the (latitudinal) ice extent along each degree of longitude, using the monthly Arctic ice index data sets (1979–2012) from the National Snow and Ice Data Center. Statistical analysis suggests that: (1) for summer months (July–October), there was a 34-year declining trend in sea-ice extent in most regions, except for the Canadian Arctic Archipelago, Greenland and Svalbard, with retreat rates of 0.0562–0.0898 latitude degree/year (or 6.26–10.00 km/year, at a significance level of 0.05); (2) for sea ice not geographically muted by the continental coastline in winter months (January–April), there was a declining trend of 0.0216–0.0559 latitude degree/year (2.40–6.22 km/year, at a significance level of 0.05). Regionally, the most evident sea-ice decline occurred in the Chukchi Sea from August to October, Baffin Bay and the Greenland Sea from January to May, the Barents Sea in most months, the Kara Sea from July to August, and the Laptev Sea and eastern Siberian Sea in August and September. Trend analysis also indicates that: (1) the decline in summer ice extent has been significant (at a 0.05 significance level) since 1999 and (2) winter ice extent showed a clear change point (decline) around 2000, becoming statistically significant around 2005. The Pacific–Siberian sector of the Arctic accounted for most of the summer sea-ice decline, while the winter recovery of sea ice in the Atlantic sector tended to decrease.
Keywords: NSIDC ice index; Arctic; sea-ice extent; ice-edge latitude.
(Published: 11 September 2014)
Citation: Polar Research 2014, 33, 21249, http://dx.doi.org/10.3402/polar.v33.2124
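The abstract quotes each retreat rate both in latitude degrees per year and in km per year. As a rough check of that conversion, the sketch below assumes a spherical Earth with a mean radius of 6371 km, so one degree of latitude is about 111.2 km. The abstract does not state which conversion factor the authors used, so the radius and the function name `lat_deg_to_km` are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (not stated in the abstract)


def lat_deg_to_km(degrees):
    """Convert a north-south distance in degrees of latitude to kilometres."""
    return degrees * math.pi * EARTH_RADIUS_KM / 180.0


# Retreat rates quoted in the abstract: (latitude degree/year, km/year).
quoted_rates = [
    (0.0562, 6.26),   # summer, lower bound
    (0.0898, 10.00),  # summer, upper bound
    (0.0216, 2.40),   # winter, lower bound
    (0.0559, 6.22),   # winter, upper bound
]

if __name__ == "__main__":
    for deg_per_year, km_per_year in quoted_rates:
        estimate = lat_deg_to_km(deg_per_year)
        print(f"{deg_per_year} deg/yr -> {estimate:.2f} km/yr (quoted: {km_per_year})")
```

Under this assumption the converted values agree with the quoted km/year figures to within a few hundredths of a kilometre; the small differences are consistent with rounding and a slightly different conversion factor.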

Directory of Open Access Journals

Polar Research (E-Journal)